Export fp8 te nemo to trt-llm #10096
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Can we please remove the parts that are not related to FP8 checkpoint support? I noticed that some of the config values were removed. We are in the process of moving the main parts of the export into mcore, and some of these code optimizations have already been done there.
In general, most of the changes are much-needed code cleanups that have been overdue for a while.
@shanmugamr1992 is simultaneously porting this code to Megatron Core, and he also has many changes that clean up the code. As @oyilmaz-nvidia mentioned, we should coordinate with him to decide which of these changes need to land in this PR. For instance, he also refactors the giant, messy weight-name dict here.
Can we make sure these refactors do not break NeMo-Aligner's existing code path? I believe CI isn't running for Aligner. cc @terrykong
@JimmyZhang12 I'm not sure I will be able to test this, but I'd like to try to manually test the NeMo-Aligner TRTLLM integration with this PR. It looks like this PR is based on a commit after @oyilmaz-nvidia upgraded to v11, so I can try a one-off build of (TRTLLM v10) + (this NeMo PR) + (mcore ToT) and, hopefully with a few hacks, validate it.
Blocking until I can verify that Aligner doesn't obviously break (ETA: this week).
From a smoke test, the Aligner code path looks okay.
* initial commit
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* PR draft
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* fixed scaling weights
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* fixed zarr loading, added flags, refactor
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* fix expert key mapping
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* refactor
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* fix: failed test was finishing with exit code 0
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* test commit -- rerun github checks
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* bugfix: naming
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* bugfix v2: naming
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* apply code review changes
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* fix TensorRTLLM build (fp8 still not supported)
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* undo refactor
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* bugfix: arguments to dist_convert
  Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
---------
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
What does this PR do?
Add support for exporting FP8 Transformer Engine (TE) NeMo checkpoints to TensorRT-LLM.
Collection: nlp
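For readers less familiar with what an FP8 export has to carry over: an FP8 checkpoint stores low-precision values together with scaling factors, and those factors must be mapped correctly (compare the "fixed scaling weights" commit). Below is a minimal, framework-free sketch of per-tensor E4M3 scaling, assuming one scale per tensor derived from the tensor's absolute maximum. It is not the NeMo or TensorRT-LLM implementation; `round_to_e4m3`, `quantize_per_tensor` and `dequantize` are hypothetical helpers, and FP8 rounding is simulated with Python floats instead of a real FP8 dtype.

```python
import math

# Assumption: weights are exported in the E4M3 FP8 format, whose largest finite
# magnitude is 448, smallest normal exponent is -6, and mantissa has 3 bits.
E4M3_MAX = 448.0
E4M3_MIN_EXP = -6
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(v: float) -> float:
    """Simulate rounding a float to the nearest representable E4M3 value."""
    v = max(-E4M3_MAX, min(E4M3_MAX, v))  # saturate instead of overflowing
    if v == 0.0:
        return 0.0
    # Subnormals share the minimum exponent, so clamp the exponent from below.
    exp = max(math.floor(math.log2(abs(v))), E4M3_MIN_EXP)
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)  # spacing of E4M3 values at this exponent
    return round(v / step) * step  # Python rounds half to even, like IEEE


def quantize_per_tensor(weights: list[float]) -> tuple[list[float], float]:
    """Return (fp8_values, scale) with weights ~= [q * scale for q in fp8_values]."""
    amax = max((abs(w) for w in weights), default=0.0)
    scale = amax / E4M3_MAX if amax > 0 else 1.0  # map amax onto the FP8 maximum
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(fp8_values: list[float], scale: float) -> list[float]:
    """Undo the per-tensor scaling (the FP8 rounding error remains)."""
    return [q * scale for q in fp8_values]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    fp8_values, scale = quantize_per_tensor(weights)
    print(fp8_values)  # [224.0, -448.0, 112.0, 0.0]
    print(dequantize(fp8_values, scale))
```

With this scheme the largest-magnitude weight maps to ±448 and every other weight is rounded to the nearest E4M3 step, so dequantizing recovers the original weights only up to FP8 rounding error. A real exporter would also have to decide whether to store the scale or its reciprocal, which is exactly the kind of detail that has to match between the source checkpoint and the target engine.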
Changelog
Usage
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information